<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

<title>
  Cross Modal Compression
</title>
<link href="./main/style.css" rel="stylesheet" type="text/css">
<script>
  // MathJax v3 reads its configuration from window.MathJax, which must be set before the library loads.
  window.MathJax = {
    tex: {
      inlineMath: [['$', '$'], ['\\(', '\\)']],
      processEscapes: true
    }
  };
</script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());

  gtag('config', 'UA-65563403-4');
</script>
</head>
<body>
<div class="container">
  <p>&nbsp;</p>
  <p><span class="title">Cross Modal Compression: Towards Human-comprehensible Semantic Compression</span></p>
  <br>
  <table border="0" align="center" class="authors">
    <tbody><tr align="center" valign="bottom">
    <td><a href="https://github.com/smallflyingpig">Jiguo Li</a><sup></sup></td>
    <td><a href="http://www.jiachuanmin.site/">Chuanmin Jia</a><sup>✝</sup></td>
    <td><a href="https://scholar.google.com/citations?user=KQB-cKAAAAAJ&amp;hl=zh-CN">Xinfeng Zhang</a><sup></sup></td>
    <td><a href="http://www.idm.pku.edu.cn/Teamcon/index/id/456/aid/2831">Siwei Ma</a></td>
    <td><a href="http://www.jdl.ac.cn/htm-gaowen/">Wen Gao</a><sup></sup></td>
    </tr>
  </tbody></table>
  <br>
  <table border="0" align="center" class="affiliations" height="72">
    <tbody><tr align="center" valign="middle">

  <td align="right" style="padding:0 0px 0 0px;">
    <a href="http://www.ict.ac.cn/">
      <img src="./image/cnplogo.jpg"  height="36" alt="">
    </a>
    </td>
    <td align="left" style="padding:0 20px 0 0px;">
      <a href="http://www.ict.ac.cn/">ICT, CAS</a>
    </td>
  <td align="right" style="padding:0 0px 0 0px;"> 
    <a href="https://www.pku.edu.cn/">
      <img src="./image/Peking_University.jpg" height="36" alt="">
    </a>
    </td>
    <td align="center" style="padding:0 20px 0 0px;">
      <sup>✝</sup><a href="https://www.pku.edu.cn/">PKU</a>
    </td>

    <td align="right" style="padding:0 0px 0 0px;"> 
        <a href="http://www.ucas.ac.cn/">
          <img src="./image/UCAS_logo.jpg" height="36" alt="">
        </a>
        </td>
        <td align="center" style="padding:0 20px 0 0px;">
          <a href="http://www.ucas.ac.cn/">UCAS</a>
        </td>
    </tr>
  </tbody></table>
  <br>

  <p><span class="section">Abstract</span> </p>
  <p>
    Traditional image/video compression aims to reduce transmission and storage cost while keeping signal fidelity as high as possible. However, with the growing demand for machine analysis and semantic monitoring in recent years, semantic fidelity, rather than signal fidelity, is becoming an emerging concern in image/video compression.
    Building on recent advances in cross modal translation and generation, we propose cross modal compression (CMC), a semantic compression framework for visual data that transforms highly redundant visual data (such as images and videos) into a compact, human-comprehensible domain (such as text, sketches, semantic maps, or attributes) while preserving the semantics.
    Specifically, we first formulate CMC as a rate-distortion optimization problem. Second, we investigate its relationship with traditional image/video compression and recent feature compression frameworks, showing how CMC differs from these prior frameworks. We then propose a novel paradigm for CMC to demonstrate its effectiveness. Qualitative and quantitative results show that the proposed CMC achieves encouraging reconstructions at an ultrahigh compression ratio, with better compression performance than the widely used JPEG baseline.
    <br>&nbsp;<br>
  </p>

  <p><span class="section">Framework</span> </p>
  <table width=90% border="0" align="center">
      <tbody>
        <tr>
          <td class="aligncenter">
            <img src="./image/CMC_framework_out.png" width=100% alt="">
          </td>
        </tr>
        <tr>
            <td colspan="1" class="caption">
              <p>
                Illustration of our proposed Cross Modal Compression (CMC) framework. The compressed representation in the compression domain is a compact, common, and human-comprehensible feature (such as text, sketch, semantic map, or attributes) which can be losslessly encoded into a bitstream. The whole framework consists of four parts: the CMC encoder, the CMC decoder, the entropy encoder, and the entropy decoder.
              <br>
            </p></td>
        </tr>
        <tr><td><br>&nbsp;<br></td></tr>
        <tr><td><br>&nbsp;<br></td></tr>
      </tbody>
  </table>


  <table width="90%" border="1" align="center">
    <caption>Comparison with related frameworks</caption>
    <tbody>
    <tr>
        <th>Method</th><th>Compression Ratio</th><th>Multi-task Analysis</th><th>Human-comprehensible</th><th>Frontend Load</th><th>Backend Load</th><th>Data Reconstruction</th>
    </tr>
    <tr>
        <td>Traditional Compression</td><td>Medium</td><td>&#10004;</td><td>&times;</td><td>Medium</td><td>High</td><td>&#10004;</td>
    </tr>
    <tr>
        <td>Ultimate Feature Compression</td><td>High</td><td>&times;</td><td>&times;</td><td>High</td><td>Low</td><td>-</td>
    </tr>
    <tr>
        <td>Intermediate Feature Compression</td><td>High</td><td>&#10004;</td><td>&times;</td><td>Medium</td><td>Medium</td><td>-</td>
    </tr>
    <tr>
        <td>Cross Modal Compression</td><td>High</td><td>&#10004;</td><td>&#10004;</td><td>Medium</td><td>Medium</td><td>&#10004;</td>
    </tr>
    </tbody>
  </table>
  <br>&nbsp;<br>


<p><span class='section'>A Paradigm of Cross Modal Compression: Image-Text-Image for cross modal image compression</span></p>
<table width=90% border="0" align="center">
  <tbody>
    <tr>
      <td class="aligncenter">
        <img src="./image/CMC_framework_ITI.png" width=100% alt="">
      </td>
    </tr>
    <tr>
        <td colspan="1" class="caption" align="center">
          <p>
            Illustration of one CMC paradigm: Image-Text-Image (ITI)
          <br>
        </p></td>
    </tr>
  </tbody>
</table>
  <p><span class='section'>Qualitative results on CUB 200 and MS COCO</span></p>
  <table width=90% border="0" align="center">
    <tbody>
      <tr>
        <td class="aligncenter">
          <img src="./image/quanlity_results.png" width=100% alt="">
        </td>
      </tr>
      <tr>
          <td colspan="1" class="caption">
            <p>
              Qualitative results of our ITI framework on CUB-200-2011 (left) and MS COCO (right). For each sample, we show the raw image, the text representation, and the reconstructed image, in that order. We also show the <i>bitrate</i> and the <i>compression ratio</i> under each text.
            <br>
          </p></td>
      </tr>
      <tr><td><br>&nbsp;<br></td></tr>
    </tbody>
</table>



  <p><span class='section'>Comparison results with JPEG and JPEG2000</span></p>
  <table width=100% border="0" align="center">
      <tbody>

          <tr>
              <td class="aligncenter">
                <img src="./image/rate_IS_jpeg_j2k_coco.png" width=100% alt="">
              </td>
              <td class="aligncenter">
                <img src="./image/rate_FID_jpeg_j2k_bird.png" width=100% alt="">
                </td>
              <td class="aligncenter">
                <img src="./image/rate_IPD_jpeg_j2k_bird.png" width=100% alt="">
              </td>
              <td class="aligncenter">
                <img src="./image/rate_PSNR_jpeg_j2k_bird.png" width=100% alt="">
                </td>

            </tr>
            <tr>
              <td class="aligncenter">
                (a) Rate-IS &uarr; on MS COCO
              </td>
              <td class="aligncenter">
                (b) Rate-FID &darr; on MS COCO
                </td>
              <td class="aligncenter">
                (c) Rate-IPD &darr; on MS COCO
              </td>
              <td class="aligncenter">
                (d) Rate-PSNR &uarr; on MS COCO
                </td>

            </tr>
            <tr>
              <td class="aligncenter">
                <img src="./image/rate_IS_jpeg_j2k_bird.png" width=100% alt="">
              </td>
              <td class="aligncenter">
                <img src="./image/rate_FID_jpeg_j2k_bird.png" width=100% alt="">
                </td>
              <td class="aligncenter">
                <img src="./image/rate_IPD_jpeg_j2k_bird.png" width=100% alt="">
              </td>
              <td class="aligncenter">
                <img src="./image/rate_PSNR_jpeg_j2k_bird.png" width=100% alt="">
                </td>

            </tr>
            <tr>
              <td class="aligncenter">
                (e) Rate-IS &uarr; on CUB 200
              </td>
              <td class="aligncenter">
                (f) Rate-FID &darr; on CUB 200
                </td>
              <td class="aligncenter">
                (g) Rate-IPD &darr; on CUB 200
              </td>
              <td class="aligncenter">
                (h) Rate-PSNR &uarr; on CUB 200
              </td>

            </tr>
            <tr><td colspan="4"><br>&nbsp;<br></td></tr>
      </tbody>
  </table>



  <p><span class='section'>Code for This Paper</span></p>
  <p>The code is available on <a href="https://github.com/smallflyingpig/cross_modal_compression">GitHub</a>.</p>



  



  <p class="section">&nbsp;</p>
</div>


</body></html>